OpenAI Releases GLIDE: A Scaled-Down Text-to-Image Model That Rivals DALL-E Performance
Text-to-image generation has been one of the most active and exciting AI fields of 2021. In January, OpenAI introduced DALL-E, a 12-billion-parameter version of the company's GPT-3 transformer language model designed to generate photorealistic images from text captions. DALL-E was an instant hit in the AI community, and its stunning performance also attracted widespread mainstream media coverage. Last month, tech giant NVIDIA released the GAN-based GauGAN2, whose name takes inspiration from French Post-Impressionist painter Paul Gauguin, just as DALL-E's did from Surrealist artist Salvador Dalí. Not to be outdone, OpenAI researchers this week presented GLIDE (Guided Language-to-Image Diffusion for Generation and Editing), a diffusion model that achieves performance competitive with DALL-E while using less than one-third of the parameters.